(54) Method for generating personality patterns and for synthesizing speech

(57) To mimic the speaking behavior of a given speaker, a method for generating personality patterns, in particular for synthesizing speech, is proposed in which acoustical as well as non-acoustical speech features (SF) are extracted from a given speech input (SI).
Description

[0001] The present invention relates to a method for generating personality patterns and to a method for synthesizing speech.
[0002] Nowadays, a large variety of equipment and appliances employ man-machine dialogue systems to ensure easy and reliable use by a human user. These man-machine dialogue systems are enabled to receive and consider users' utterances, in particular orders and/or inquiries, and to react and respond in an appropriate way. Nevertheless, current speech synthesis systems involved in such man-machine dialogue systems suffer from a lack of personality and naturalness. Although the systems are enabled to deal with the context of the situation in an appropriate way, the prepared and output speech of the dialogue system often sounds monotonous, machine-like, and not embedded into the particular situation.

[0003] It is an object of the present invention to provide a method for generating personality patterns, in particular for synthesizing speech, and a method for synthesizing speech in which naturalness of the speech and its features can be realized.

[0004] The object is achieved by a method for generating personality patterns, in particular for synthesizing speech, with the features of claim 1. Furthermore, the object is achieved by a method for synthesizing speech according to the characterizing features of claim 11. A system and a computer program product for carrying out the inventive methods are the subject-matter of claims 14 and 15, respectively. Preferred embodiments of the inventive methods are within the scope of the dependent subclaims.

[0005] In the inventive method for generating personality patterns, in particular for synthesizing speech, a speech input is received and/or preprocessed. From the speech input acoustical and/or non-acoustical speech features are extracted. Based on the extracted speech features and/or on models and/or parameters thereof, a personality pattern is generated and/or stored.
[0006] It is therefore a basic idea of the present invention to extract acoustical and, alternatively or simultaneously, non-acoustical speech features from a received speech input. The speech features are then directly or indirectly used to construct a personality pattern which can later be used to reconstruct a speech output that mimics the speech input and its speaker. The speech features are therefore parameterized or modeled and included or described in certain models or units.
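
As an illustration of how a parameterized personality pattern might be held and stored, the following minimal Python sketch groups acoustical and non-acoustical features for one speaker and serializes them. The class name `PersonalityPattern`, its field names, and the use of JSON as the storage format are assumptions made for this example only; the description does not prescribe any particular representation.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PersonalityPattern:
    """Illustrative container for the parameterized speech features of one speaker."""
    speaker_id: str
    # Assumed grouping: prosodic, voice quality, statistical/spectral parameters.
    acoustical: dict = field(default_factory=dict)
    # Assumed grouping: contextual features and usage statistics.
    non_acoustical: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize the pattern so it can be stored for later use.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PersonalityPattern":
        # Rebuild a stored pattern from its JSON text.
        return cls(**json.loads(text))
```

A stored pattern can be reloaded with `PersonalityPattern.from_json(...)` and handed to whatever synthesis component consumes it.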

[0007] According to an embodiment of the inventive method for generating personality patterns, online input speech and/or speech of a speech data base for at least one given speaker are used for receiving said speech input. Using a speech data base enables a system involving the inventive method to generate the personality patterns in advance of an application. That means that, before the system is applied, for example in a speech synthesizing unit, a speech model for a single speaker or for a variety of speakers can be constructed. Within the application of the inventive method it is also possible to construct the personality patterns during the application in a speech synthesizing unit in a real time or online manner, so as to adapt a speech output generated in a dialogue system during the application and/or during the dialogue with the user.

[0008] It is an aspect of the present invention to use a large variety of features from the speech input so as to model the personality patterns as well as possible and thereby to achieve, in an application of a dialogue system, a particularly natural responding speech output.
[0009] It is therefore an aspect of a further embodiment of the present invention to use prosodic features, voice quality features, global statistical and/or spectral properties, and/or the like as acoustical features.
[0010] Within the class of prosodic features, pitch, pitch range, intonation attitude, loudness, speaking rate, phone duration, speech element duration features, and/or the like can be employed.

[0011] Within the class of voice quality features, phonation type, articulation manner, voice timbre features, and/or the like can be employed.

[0012] In the class of non-acoustical features, contextual features and/or the like may be important in accordance with a further advantageous embodiment of the present invention. In particular, syntactical, grammatical, semantical features, and/or the like can be used as contextual features.

[0013] As a human speaker has distinct preferences in constructing sentences, phrases, word combinations, and/or the like, according to a further preferred embodiment of the present invention, within the class of non-acoustical features, statistical features on the usage, distribution, and/or probability of speech elements - such as words, subword units, syllables, phonemes, phones, and/or the like - and/or combinations of them within said speech input can be used. Additionally, sentence, phrase, and word combination preferences can be evaluated and included into said personality pattern.
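
The usage statistics described in [0013] can be illustrated with a minimal Python sketch. It assumes the speech input has already been transcribed into lists of words by a recognizer; the function name `usage_statistics` and the choice of relative frequencies of single words and adjacent word pairs are illustrative assumptions, not statistics prescribed by the description.

```python
from collections import Counter


def usage_statistics(utterances):
    """Return relative frequencies of words and of adjacent word pairs.

    `utterances` is a list of recognized utterances, each a list of words.
    Hypothetical helper: one simple choice among many possible statistics.
    """
    words = Counter()
    pairs = Counter()
    for utterance in utterances:
        tokens = [w.lower() for w in utterance]
        words.update(tokens)
        # Adjacent word pairs stand in for "word combination" preferences.
        pairs.update(zip(tokens, tokens[1:]))
    total_words = sum(words.values())
    total_pairs = sum(pairs.values())
    word_freq = {w: c / total_words for w, c in words.items()} if total_words else {}
    pair_freq = {p: c / total_pairs for p, c in pairs.items()} if total_pairs else {}
    return word_freq, pair_freq
```

The same counting scheme could be applied to subword units, syllables, or phones by changing what the inner lists contain.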
[0014] To prepare for the extraction of contextual features or the like, a process of speech recognition is preferably carried out within the inventive method.
[0015] Alternatively or additionally, a process of speaker identification and/or adaptation can be performed, in particular so as to increase the matching rate of the feature extraction and/or the recognition rate of the process of speech recognition.
[0016] In the inventive method for synthesizing speech, in particular for a man-machine dialogue system, the inventive method for generating personality patterns is employed.

[0017] According to a further embodiment of the inventive method for synthesizing speech, the method for generating personality patterns is essentially carried out in a preprocessing step, in particular based on a speech data base or the like.

[0018] Alternatively or additionally, the method for generating personality patterns can be carried out and/or continued in a continuous, real time, or online manner. This enables a system involving said method for synthesizing speech to adapt its speech output in accordance with the received input during the dialogue.
[0019] Both of the methods for generating personality patterns and/or for synthesizing speech can be configured to create a personality pattern or a speech output which is in some sense complementary to the personality pattern or character assigned to the speaker of the speech input. That means, for instance, that in the case of an emergency call system for activating ambulance or fire alarm services the speaker of the speech input might be excited and/or confused. It might therefore be necessary to calm down the speaking person, and this can be achieved by creating a personality pattern for the speech synthesis reflecting a strong, confident, and safe character. Additionally, it might also be possible to construct personality patterns for the synthesized speech output which reflect a gender which is complementary to the gender of the speaker of the speech input, i.e. in the case of a male speaker, the system might respond as a female speaker so as to make the dialogue most convenient for the speaking person.
[0020] It is a further aspect of the present invention to provide a system, an apparatus, a device, and/or the like for generating personality patterns and/or for synthesizing speech which is in each case capable of performing and/or realizing the inventive methods for generating personality patterns and/or for synthesizing speech and/or its steps.

[0021] According to a further aspect of the present invention, a computer program product is provided, comprising computer program means which is adapted to perform and/or to realize the inventive method for generating personality patterns and/or for synthesizing speech and/or the steps thereof when it is executed on a computer, a digital signal processing means, and/or the like.

[0022] The aspects of the present invention will become more apparent taking into account the following remarks:

[0023] After the identification of a speaker, both his relevant voice quality features and his speech itself - as described by any units, such as words, syllables, diphones, sentences, and/or the like - are automatically extracted according to the invention. Also, information about preferred sentence structure and word usage is extracted and used to create a speech synthesis system with those characteristics in a completely unsupervised way.

[0024] The starting point for these inventive concepts is the lack of personality of current speech synthesis systems. Prior art systems are developed with text-to-speech (TTS) operation in mind, where intelligibility and naturalness of speech are the most important. For dialogue systems, however, the personality of the dialogue partner is essential, too. Depending on the personality of the artificial dialogue partner, the speaker may be interested in continuation of the dialogue or not. Thus, adding a personality pattern to the speech generated by the device may be crucial for the success of the dialogue device.

[0025] Therefore, it is proposed to collect and store all information about the speaking style of the person making conversation with the system or device and to use said information to modify the speaking style of the device.

[0026] The proposed methods can be used to mimic the actual speaker talking to the device but also to equip the device with some different personalities, e.g. gathered from the speaking style of famous people, movie stars, or the like. This can be very attractive for potential customers. The proposed system can be used not only to mimic a speaker's behavior but more generally to control the dialogue depending on the changing speaking style and emotions of the human partner.
[0027] The collection of features describing the speaker's personality can be done on different levels during the conversation of the human with a dialogue unit. In order to mimic the speaker's voice, the speech signal has to be recorded and segmented into phones, diphones, and/or into other speech units or speech elements in dependence on the speech synthesis method used in the system.

[0028] Prosodic features like pitch, pitch range, attitude of sentence intonation (monotonous or affected), loudness, speaking rate, durations of phones, and/or the like can be collected to characterize the speaker's prosody.
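
A minimal Python sketch of how some of these prosodic features could be summarized for one utterance is given below. It assumes that a pitch tracker has already produced per-frame pitch values in Hz (with 0 marking unvoiced frames) and that a segmenter has produced phone durations in seconds; the function name `prosodic_summary` and the returned keys are hypothetical and chosen for this example only.

```python
from statistics import mean


def prosodic_summary(f0_hz, phone_durations_s):
    """Summarize pitch and timing for one utterance.

    f0_hz: per-frame pitch estimates in Hz, 0 for unvoiced frames (assumed input).
    phone_durations_s: duration of each segmented phone in seconds (assumed input).
    """
    voiced = [f for f in f0_hz if f > 0]
    total_time = sum(phone_durations_s)
    return {
        "mean_pitch_hz": mean(voiced) if voiced else None,
        "pitch_range_hz": (max(voiced) - min(voiced)) if voiced else None,
        "mean_phone_duration_s": mean(phone_durations_s) if phone_durations_s else None,
        # Speaking rate expressed here as phones per second.
        "speaking_rate_phones_per_s": (
            len(phone_durations_s) / total_time if total_time else None
        ),
    }
```

Loudness and intonation attitude are left out of the sketch because they need the raw signal or a contour model rather than these two simple inputs.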

[0029] Voice quality features like phonation type, articulation manner, voice timbre, and/or the like can be automatically extracted from the collected speech data.
[0030] Speaker identification or a speaker identification module is necessary for a proper function of the system.

[0031] The system can also collect all the words recognized from the utterances spoken by the speaker and generate and evaluate statistics on their usage. This can be used to find the most frequent phrases, words used by a given speaker, and/or the like. Also, syntactic information gathered from the recognized phrases can enhance the quality of the personality description.
[0032] After all necessary information has been collected, the dialogue system can adjust parameters and units of acoustic output - for example the synthesized waveforms or the like - and modes of text generation to suit the recognized speaker's characteristics.
[0033] The parameterized personality can be stored for future use or can be preprogrammed in the dialogue device. The information can be used to recognize speakers and to change the personality of the system depending on the user's preference or mood, for example in case of a system with a built-in emotion recognition engine.

[0034] The personality can be changed according to the user's wish, a preprogrammed sequence, or depending on the changing speaker's style and emotions of the



[0035] The main advantage of such a system is the possibility to adapt the dialogue to the given speaker, make the dialogue more attractive, and/or the like. The possibility to mimic certain speakers or to switch between different personalities or speaking styles can be very entertaining and attractive for the user.
[0036] In the following, further advantages and aspects of the present invention will be described with reference to the accompanying figure.

Fig. 1 is a schematical block diagram describing a preferred embodiment of a method for synthesizing speech employing an embodiment of the inventive method for generating personality patterns.

[0037] The schematical block diagram of Fig. 1 shows a preferred embodiment of the inventive method for synthesizing speech employing an embodiment of the inventive method for generating personality patterns from a given received speech input SI.
[0038] In step S1, speech input SI is received. In a first section S10 of the inventive method for synthesizing speech, non-acoustical features are extracted from the received speech input SI. In a second section S20 of the inventive method for synthesizing speech, acoustical features are extracted from the received speech input SI. The sections S10 and S20 can be performed in parallel or sequentially on a given device or apparatus.
[0039] In the first section S10 for extracting non-acoustical features from the speech input SI, in a first step S11, speech parameters are extracted from said speech input SI. In a second step S12, the speech input SI is fed into a speech recognizer to analyze the content and the context of the received speech input SI.
[0040] Based on the recognition result, in a following step S13 contextual features are extracted from said speech input SI; in particular, syntactical, semantical, grammatical, and statistical information on particular speech elements is obtained.

[0041] In the embodiment of Fig. 1, the second section S20 of the inventive method for synthesizing speech consists of three steps S21, S22, and S23 to be performed independently of each other.
[0042] In the first step S21 of the second section S20 for extracting acoustical features, prosodic features are extracted from the received speech input SI. Said prosodic features may comprise features of pitch, pitch range, intonation attitude, loudness, speaking rate, speech element duration, and/or the like.
[0043] In a second step S22, voice quality features are extracted from the given received speech input SI, for instance phonation type, articulation manner, voice timbre features, and/or the like.
[0044] Finally, in a third and final step S23 of the second section S20, statistical/spectral features are extracted from the given speech input SI.
[0045] The non-acoustical features and the acoustical features obtained from sections S10 and S20 are merged in a following postprocessing step S30 to detect, model, and store a personality pattern PP for the given speaker.

[0046] The data describing the personality pattern PP for the current speaker are fed into a following step S40 which includes the steps of speech synthesis, text generation, and dialogue managing from which a responsive speech output SO is generated and then output in a final step S50.
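
The merging of sections S10 and S20 in step S30 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `generate_personality_pattern`, the representation of extractors as plain callables returning dictionaries, and the dictionary layout of the pattern are hypothetical and are not taken from the description; the extractors passed in stand in for steps S11 to S13 and S21 to S23.

```python
def generate_personality_pattern(speech_input, non_acoustic_extractors, acoustic_extractors):
    """Merge two groups of extracted features into one pattern (step S30, sketched).

    Each extractor is a callable that takes the speech input and returns a dict.
    """
    pattern = {"non_acoustical": {}, "acoustical": {}}
    for extract in non_acoustic_extractors:  # section S10
        pattern["non_acoustical"].update(extract(speech_input))
    for extract in acoustic_extractors:      # section S20
        pattern["acoustical"].update(extract(speech_input))
    return pattern
```

Because the two loops do not depend on each other, they could equally be run in parallel, matching the remark in [0038] that S10 and S20 can be performed in parallel or sequentially.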



Claims 

1. Method for generating personality patterns, in particular for synthesizing speech, wherein:

- speech input (SI) is received and/or preprocessed,
- acoustical and/or non-acoustical speech features (SF) are extracted from said speech input (SI),
- based on the extracted speech features (SF) or on models or parameters thereof a personality pattern (PP) is generated and/or stored.

2. Method according to claim 1, wherein online input speech and/or speech of a speech data base for at least one given speaker are used for receiving said speech input (SI).

3. Method according to any one of the preceding claims, wherein prosodic features, voice quality features, global statistical and/or spectral properties, and/or the like are used as acoustical features.

4. Method according to claim 3, wherein pitch, pitch range, intonation attitude, loudness, speaking rate, phone duration, speech element duration features, and/or the like are used as prosodic features.

5. Method according to any one of the claims 3 or 4, wherein phonation type, articulation manner, voice timbre features, and/or the like are used as voice quality features.

6. Method according to any one of the preceding claims, wherein contextual features, and/or the like are used as said non-acoustical features.

7. Method according to claim 6, wherein syntactical, grammatical, semantical features, and/or the like are used as contextual features.

8. Method according to any one of the claims 6 or 7, wherein statistical features on the usage, distribution, and/or probability of speech elements - such as words, subword units, syllables, phonemes, phones, and/or the like - and/or combinations of them within said speech input (SI) are used as non-acoustical features.

9. Method according to any one of the preceding claims, wherein a process of speech recognition is carried out, in particular to prepare the extraction of contextual features and/or the like.

10. Method according to any one of the preceding claims, wherein a process of speaker identification and/or adaptation is performed, in particular so as to increase the matching rate of the feature extraction and/or of the recognition rate of the process of speech recognition.



11. Method for synthesizing speech, in particular for a man-machine dialogue system, wherein the method for generating personality patterns according to any one of the claims 1 to 10 is employed.

12. Method according to claim 11, wherein the method for generating personality patterns is essentially carried out in a preprocessing step, in particular based on a speech data base or the like.

13. Method according to any one of the claims 11 or 12, wherein the method for generating personality patterns is carried out and/or continued in a continuous, real time, or online manner.

14. System for generating personality patterns and/or for synthesizing speech which is capable of performing and/or realizing the method for generating personality patterns according to any one of the claims 1 to 10 and/or the method for synthesizing speech according to any one of the claims 11 to 13 and/or the steps thereof.

15. Computer program product, comprising computer program means adapted to perform and/or to realize the method for generating personality patterns according to any one of the claims 1 to 10 and/or the method for synthesizing speech according to any one of the claims 11 to 13 and/or the steps thereof when it is executed on a computer, a digital signal processing means, and/or the like.






EP 1 271 469 A1 



Fig. 1 (block diagram; recoverable block labels):

S1: Receive speech input SI
S10: Non-acoustical features extraction
- S11: Extract speech parameters from SI
- S12: Recognize speech input SI
- S13: Extract contextual features from SI: syntactical, semantical, grammatical, statistics on speech elements
S20: Acoustical features extraction
- S21: Extract prosodic features from SI: pitch, pitch range, intonation attitude, loudness, speaking rate, speech element duration
- S22: Extract voice quality features from SI: phonation type, articulation manner, voice timbre
- S23: Extract statistical/spectral features from SI
Detecting, modelling, storing personality pattern PP
S40: [...]rator / Dialogue Manager
S50: Speech output SO







European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 01 11 5216

DOCUMENTS CONSIDERED TO BE RELEVANT

Categories listed: X, Y

Citation of document with indication, where appropriate, of relevant passages:

- US 5 278 943 A (GASPER ELON ET AL), 11 January 1994 (1994-01-11): abstract; column 4, line 53 - column 6, line 9; column 13, line 14 - column 48, line 2.
- JANET E. CAHN: "The Generation of Affect in Synthesized Speech", JOURNAL OF THE AMERICAN VOICE I/O SOCIETY, [Online] vol. 8, July 1990 (1990-07), pages 1-19, XP002183399. Retrieved from the Internet: <URL:http://www.media.mit.edu/~cahn/masters-thesis.html> [retrieved on 2001-11-20]: page 3 - page 6.
- KLASMEYER ET AL: "The perceptual importance of selected voice quality parameters", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1997. ICASSP-97., 1997 IEEE INTERNATIONAL CONFERENCE ON MUNICH, GERMANY 21-24 APRIL 1997, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC, US, 21 April 1997 (1997-04-21), pages 1615-1618, XP010226301, ISBN: 0-8186-7919-0: abstract.
- WO 99 12324 A (BACK WILLIAM K; HOLLINS JACK (US)), 11 March 1999 (1999-03-11): page 1, line 23 - line 25; claim 1; figure 2A.

Relevant to claim (in the order listed): 1,2,11,14,15; 3-5; 3-5; 1,11,14,15

CLASSIFICATION OF THE APPLICATION (Int.Cl.7): G10H3/02, G10L13/08

TECHNICAL FIELDS SEARCHED (Int.Cl.7): G10L

The present search report has been drawn up for all claims.

Place of search: MUNICH
Date of completion of the search: 20 November 2001
Examiner: De Vos, L

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document

DOCUMENTS CONSIDERED TO BE RELEVANT (continued)

- US 6 144 938 A (ALBERT ROY D ET AL), 7 November 2000 (2000-11-07): abstract; claims 9, 22-25; figures 15, 19; column 20 - column 21 (line numbers illegible). Relevant to claim: 1,11,14,15.




ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 01 11 5216

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report. The members are as contained in the European Patent Office EDP file on 20-11-2001. The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information.

Patent document cited    Publication    Patent family       Publication
in search report         date           member(s)           date

US 5278943               11-01-1994     NONE
WO 9912324               11-03-1999     US 6317486 B1       13-11-2001
                                        WO 9912324 A1       11-03-1999
US 6144938               07-11-2000     EP 1074017 A1       07-02-2001
                                        WO 9957714 A1       11-11-1999

For more details about this annex: see Official Journal of the European Patent Office, No. 12/82


